The set of all items sold in a market ...for e.g., in Walmart.
A subset of items.
Individual sets of purchases made by customers.
The number of baskets containing all items in itemset I
Given a support threshold s, sets of items that appear in atleast s baskets are called frequent itemsets
In [2]:
# Finding frequent Itemsets - the naive method
items = {'m' , 'c', 'p', 'b' , 'j'} # abbreviated item names
support = 3 # the support threshold
baskets = [{'m','c','b'}, {'m','p','j'}, {'m','b'}, {'c','j'}, {'m','p','b'}, {'m','c','b','j'}, {'c','b','j'}, {'b','c'}]
# todo: code to find frequent itemsets
In [6]:
# 100,000 items = 5 billion pairs
no_of_items = 100000
no_of_pairs = no_of_items * (no_of_items - 1) / 2 # n choose 2
print no_of_pairs
# 4 bytes per count = 20 gb of memory
memory_in_bytes = 4
total_memory_reqd_bytes = no_of_pairs * memory_in_bytes
# 1 GB = 1,000 MB = 1,000,000 KB = 1,000,000,000 Bytes
total_memory_reqd_gb = total_memory_reqd_bytes / (1000 * 1000 * 1000)
print 'Memory Required (GB):', total_memory_reqd_gb
In [ ]: